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Abstract 

. In this paper, we propose a low-rank approximation method based on discrete least-squares 

■ for the approximation of a multivariate function from random, noisy-free observations. Spar- 
sity inducing regularization techniques are used within classical algorithms for low-rank ap- 

■ proximation in order to exploit the possible sparsity of low-rank approximations. Sparse 
h-^ ■ low-rank approximations are constructed with a robust updated greedy algorithm which in- 

_ eludes an optimal selection of regularization parameters and approximation ranks using cross 

(~| ■ validation techniques. Numerical examples demonstrate the capability of approximating func- 

tions of many variables even when very few function evaluations are available, thus proving 
the interest of the proposed algorithm for the propagation of uncertainties through complex 
computational models. 



^ '. 1 Introduction 



Uncertainty quantification has emerged as a crucial field of investigation for various branches 

■ of science and engineering. Over the last decade, considerable efforts have been made in the 
, development of new methodologies based on a functional point of view in probability, where random 
• outputs of simulation codes are approximated with suitable functional expansions. Typically, 
, when considering a function m(^) of input random parameters ^ = (^i . . -Cd), an approximation 

is searched under the form u{^) « X]i=i^j'^«(0 where the constitute a suitable basis of 

multiparametric functions (e.g. polynomial chaos basis). 
. ; Several methods have been proposed for the evaluation of functional expansions (see [19, 22, 

r> , 17]). Non intrusive discrete projection methods allow the estimation of expansion coefficients by 

■ using evaluations of the numerical model at certain sample points, thus allowing the simple use 
of existing deterministic simulation codes. However the dimension P of classical approximation 
spaces has an exponential (or factorial) increase with dimension d and hence the computational 
cost becomes prohibitively high as one needs to evaluate the model for a large number of samples 
of the order of P. The objective is to construct an approximation of the high dimensional function 
M, given the fact that we have only limited information on it. We are particularly interested in 
the case where the dimension d is large but the "effective dimensionality" of the function is fairly 
small. 

In order to handle high-dimensional models, we here propose a low rank tensor approxima- 
tion method using a discrete least-squares approach, which exploits the tensor structure of the 
stochastic function spaces and the possible sparsity of low rank approximations. The underlying 
assumption is that the model output functional can be well approximated using sparse low-rank 
tensor approximations. 
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Low rank approximation methods have recently been applied to many areas of scientific com- 
puting for approximating elements in high dimensional tensor spaces [15, 13, 12, 10], with several 
applications in uncertainty propagation [21, 6, 23, 14, 20]. In the context of uncertainty quantifica- 
tion, for problems involving very high stochastic dimension d, instead of evaluating the coefficients 
of an expansion in a given approximation basis (e.g. polynomial chaos), function u is approximated 
in suitable low-rank tensor subsets of the form 

M = [v = Fm (p'^' , . . . , p^'')); pf'^) € R"" } 

where Fm is a multilinear map which constitutes a paramctrization of the subset M and the 
p^*^) are the parameters. Low rank tensor subsets have nice approximation properties in the sense 
that they are able to approximate with a good precision a large class of functions that can be 
encountered in practical applications. The dimension of the paramctrization X]fe=i "fe typically 
grows linearly with the dimension d, thus making possible approximation in high dimension. Here, 
we rely on least-squares methods in order to construct approximations in these tensor subsets, 
using only sample evaluations of the hmction u. Methods based on least-squares have already been 
proposed in [2, 24, 8] for the construction of tensor approximations of multivariate functionals. 
Here, we propose an alternative construction of tensor approximations using greedy algorithms 
and least-squares methods with sparse rcgularization allowing the construction of sparse low rank 
approximations with only a few function evaluations. 

The proposed method consists in approximating the model with a m-term representation 
UmiS,) = where the ai arc real coefficients and where the Wi are successively se- 

lected in a sparse low-rank (typically rank-one) tensor subset, ideally 

^m-sparse = |^ = Fm{P^^\ ■ ■ • , P^'^'); P^'^' € M"^ Hp^^^ ||o < mfe} 

where || • ||o is the zero "norm" counting the number of non zero coefficients. Although _A/('""®p'^''^'^ 

may have a very small effective dimension X]fc=i X]fc=i '^fe' ^^e structure of this set makes 

optimization in _yv4™'"''P^'^''"' a combinatorial problem. Therefore, we replace the ideal sparse tensor 
subset by 

M^ = {v = Fm{p^'\. ■ . , P^'')); P^'^^ e M"^ llpW 111 < 7fe} 

by introducing a convex rcgularization of the constraints using ^i-norm. In practice, optimal 
approximations in subset are computed using an alternating minimization algorithm that 
exploits the specific low dimensional paramctrization of the subset M'^ and that involves the solu- 
tion of successive least-squares problems with sparse ^i-regularization. Cross validation techniques 
are introduced in order to select optimal rcgularization parameters 7. The progressive (greedy) 
construction of the m-term representation has the advantages of being adaptive and also of re- 
ducing the dimension of successive least-squares problems, thus improving the robustness of the 
least-squares method when only few samples are available. As a result, the proposed technique 
allows to approximate the response of models with a large number of random inputs even with a 
limited number of model evaluations. A sparse rcgularization technique is also used in order to 
retain only the most significant functions Wi , which results in an improvement of robustness of the 
greedy construction when only a limited number of samples are available. In this paper, we restrict 
the presentation to the case where successive corrections are computed in the set ^A of rank-one 
tensors. The extension of the methodology to other low-rank tensor subsets is straightforward. 

The outline of the paper is as follows. In section 2, we introduce some basic concepts about 
functional approaches in uncertainty propagation. We also detail methods based on least-squares 
for the computation of approximate functional expansions. In section 3, we introduce the proposed 
sparse low rank approximation method based on regularized least-squares. Finally the ability of 
the proposed method to detect and exploit low rank and sparsity of functions is illustrated on 
numerical applications in section 4. 
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2 Functional representation and least-squares methods 

2.1 Stochastic function spaces and their tensor structure 

We here introduce the definitions of stochastic functions spaces and approximation spaces. We 
consider a set ^ of d random variables and we denote by (S, the associated probability 

space, where H C M'^ and where is the probability law of ^. Wc suppose that ^ can be split into 
r mutually independent sets of random variables {^k}k=n i-<3- C = {^i; • • • where takes 
values in Ek C M**^. Wc have d = X]fc=i ^fc- denote by {Ek,Bk,P^k) the probability space 
associated with ^fc, with Pj^ the probability law of ^k- Therefore, the probability space (S, B, P{) 
associated with ^ = (^i, ■ ■ ■ , ^r) has a product structure with S = x^^j^Sfc and P^ = (8i^=iP{fc ■ 

We denote by Lp^ (5) the Hilbert space of second order random variables defined on (S, B, P^), 
defined by 

L\{E) = |u : y e S ^ u{y) e M; ^ u{yfP^{dy) < ooj , 
which is a tensor Hilbert space with the following tensor structure: 

L%, (S) = L%^^ (Si) g) ... L%^^ (Er). 

We now introduce approximation spaces iS^^ C Lp^ (S^) with orthonormal basis {(/>j'^^}J=i, such 
that 

where v^*'^ denotes the vector of coefficients of v^''^ and where 0^*^^ = {4>i'\ . . . , (prik)'^ denotes the 
vector of basis functions. An approximation space Sn C I/p^ (S) is then obtained by tensorization 
of approximation spaces <S^^ : 

where /„ = x^^ Jl ...nk} and = {(f)^^^ (g) . . . (g) . . . ,2/^) = (/'l^^'(yi) • • • (f^t^iVr)- An 

element v = G Sn can bo identified with the algebraic tensor v G ® . . . ® M"'' such that 

(v)i = Denoting (j){y) = <p^^\yi) ig) . . . ig) ^^''\yr) € (g) . . . (g)M"'-, we have the identification 
Sn ^ K"^ ... M"'- with 

'Sr^ = {v{y) = v); V G M"^ ® . . . ® M"-^} , 

where (•, •) denotes the canonical inner product in (g) . . . ® R""^. 

Here, we suppose that the approximation space Sn is given and sufficiently rich to allow ac- 
curate representations of a large class of functions (e.g. by choosing polynomial spaces with high 
degree, wavelets with high resolution...). Then, the aim is to provide a method for the approxima- 
tion of functions in Sn for high dimensional applications and using only limited information on the 
functions. Note that in the case r = d, approximation space Sn has a dimension 11^=1 ''^k which 
grows exponentially with the dimension d, thus making impossible the numerical representation 
and computation of an element v G Sn for high dimensional applications. In order to reduce the 
dimensionality, a classical strategy consists in introducing approximation subspaces Sn,p C Sn 
that arc constructed using suitable tensorization rules: Sn.p = {f^ = X^ie/ ""i^i '■ '-"j ^ ^l; where 
In,p C In is an index set which can be chosen a priori. For r = d, a, typical construction consists 
in taking for S^^ the space of degree p polynomials Pp(Sfe), with (/jj*-* the orthogonal polynomial 
of degree j — 1, and for In,p = {i G In', X^fc=i(«/c ~ 1) < p}- Thus, Sn,p appears to be the so called 
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polynomial chaos composed of multidimensional polynomials with total degree less than p [11, 26]. 
Other tensorization strategies have been proposed, based on a priori knowledge on the solution 
or based on adaptive strategies. In the present work, we are interested in alternative methods 
that try to approximate high dimensional functions using low-rank approximations. They will be 
introduced in section 3. 



2.2 Least-squares methods 

We here consider the case of a real- valued model output u : S — > M. We denote by {y^}^=i C S a 
set of Q samples of ^, and by {u{y'')}^^i C R the corresponding function evaluations. We suppose 
that an approximation space Sp = span{(f>i}fLi is given. Classical least-squares method for the 
construction of an approximation up G Sp then consists in solving the following problem: 

1 

\\u - Up fg = mm \\u - v\\l with Ml = -J^uiy^f ■ (1) 

^ ^ 9=1 

Note that || • ||q only defines a semi-norm on ip^(S) but it may define a norm on the finite 
dimensional subspace Sp if we have a sufficient number Q of model evaluations. A necessary 
condition is Q > P. However, this condition may be unreachable in practice for high dimensional 
stochastic problems and usual a priori (non adapted) construction of approximation spaces Sp. 
Moreover, classical least-squares method may yield bad results because of ill-conditioning (solution 
very sensitive to samples). A way to circumvent these issues is to introduce a regularized least- 
squares functional: 

j\v) = \\u-vfQ + XCiv), (2) 

where £ is a regularization functional and where A refers to some regularization parameter. The 
regularized least-squares problem then consists in solving 

J^{u^) = min J^{v). (3) 
veSp 

Denoting by v = (wi, . . . ,vp)^ € the coefficients of an element v = J2iLi '^i4>i € Sp, we can 
write 

\\u-v\\l = \\7^-^v\\l (4) 

with z = {u{y^), . . . ,u{y^)Y' E the vector of random evaluations of ?i(^) and $ G MP^^ 
the matrix with components = 4'i{y'^). We can then introduce a function L : — >■ R 

such that £iJ2iVz<i)i) = i(v), and a function : R^ ^ R such that J^{J2i'"i(l'i) = J^i'^) = 
II z — *v||2 + AZ/(v). An algebraic version of least-squares problem (3) can then be written as 
follows: 

min ||z - *v||2 -|- Ai(v). (5) 

Regularization introduces additional information such as smoothness, sparsity, etc. Under some 
assumptions on the regularization functional C, problem (3) may have a unique solution. However, 
the choice of regularization strongly infiuences the quality of the obtained approximation. Another 
significant component of solving (5) is the choice of regularization parameter A. In this paper, we 
use cross validation for the selection of an optimal value of A. 



2.3 Sparse regularization 

Over the last decade, sparse approximation methods have been extensively studied in different sci- 
entific disciplines. A sparse function is one that can be represented using few non zero terms when 
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expandcid on a suitable basis. In the context of uncertainty quantification, if a stochastic function 
is known to be sparse on a particular function basis, e.g. polynomial chaos (or tensor basis), sparse 
regularization methods can be used for quasi optimal recovery with only a few sample evaluations. 
In general, a successful reconstruction of sparse solution vector depends on sufficient sparsity 
of the coefficient vector and on additional technical properties (e.g. incoherence). This strategy 
has been found to be effective for non-adapted sparse approximation of the solution of PDEs [3, 7]. 

More precisely, an approximation X^ili Ui(t>i{C) of a function w(^) is considered as sparse on a 
particular basis {<j>i{C}}i=i if it admits a good approximation with only a few non zero coefficients. 
Under certain conditions, a sparse approximation can be computed accurately using only Q <^ P 
random samples of u{^) via sparse regularization. 

Given the random samples z e M9 of the function u{^), a best m-sparse (or m-term) approxi- 
mation of u can be ideally obtained by solving the constrained optimization problem 

min llz — ^vllo subject to ||v||o < m, (6) 

where ||v||o = #{i € {1, . . . , P} : Vi ^ 0} is the so called £o-"iiorm" of v which gives the number 
of non zero components of v. Problem (6) is a combinatorial optimization problem which is 

NP hard to solve. Under certain assiimptions, problem (6) can be reasonably well approximated 
by the following constrained optimization problem which introduces a convex relaxation of the 
Iq- "norm" : 

mm||z-*v||| subject to ||v||i < 5, (7) 

where ||v||i = X^^^ \vi\ is the ^i-norm of v. Since the £2 and ^i-norms are convex, we can 
equivalently consider the following convex optimization problem, known as Lasso [25] or basis 
pursuit [5]: 

min ||z-*v||i + A||v||i, (8) 

where A > corresponds to a Lagrange multiplier whose value is related to S. Problem (8) appears 
as a regularized least-squares problem. The £i-norm is a sparsity inducing regularization function 
in the sense that the solution v of (8) may contain components which are exactly zero. Several 
optimization algorithms have been proposed for solving (8) (see [1]). In this paper, we use the 
Lasso modified least angle regression algorithm (see LARS presented in [9]) that provides a set of 
Nr solutions, namely the regularization path, with increasing £i-norm. Let v-' , with j = 1, . . . , Nr, 
denote this set of solutions, Aj c {1, . . . , P} be the index set corresponding to non zero coefficients 
of v-' , v^. e the vector of the coefficients Aj of v^ , and G M'^^#'^j the submatrix of $ 

obtained by extracting the columns of $ corresponding to indices Aj. The optimal solution v is 
then selected using the fast leave-one-out cross validation error estimate [4] which relies on the use 
of the Sherman-Morrison- Woodbury formula (sec [3] for its implementation within Lasso modified 
LARS algorithm). Algorithm 1 briefly outlines the cross validation procedure for the selection of 
the optimal solution. In this work, we have used Lasso modified LARS implementation of SPAMS 
software [18] for ^i-regularization. 
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Algorithm 1 Algorithm to determine optimal LARS solution using leave-one-out cross validation. 
Input: sample vector z €E and matrix ^ S ]R*3xP 
Output: vector of coefficients v G 
1: Run the Lasso modified LARS procedure to obtain Nr solutions v^, . . . , v^'' of the regular- 
ization path, with corresponding sets of non zeros coefficients Ai, . . . ,An^. 

2: for j = 1,. .. ,Nr do 

3: Recompute the non zero coefficients v^^ of v-' using ordinary least-squares: 

v\. = argmin^gjj#A, ||z - *A,v||i 
4: Compute hq = {^Aji^^^^Aj)~^^'Aj)qq using Sherman-Morrison- Woodbury formula. 

5: Compute relative leave-one-out error Cj = ^ Ylq=i ^\i-h jaCz/^" ) ' '^li^'^s ^{^) the 

empirical standard deviation of z. 
6: end for 

7: Return the optimal solution v such that vaj, = v^ „ with j* = argmiuj ej. 



3 Sparse low-rank tensor approximations based on least- 
squares method 

The aim is to find a sparse low rank approximation of a function u{^) in the finite dimensional 
tensor space Sn = S^^ (g) . . . (g) 5^ . The proposed method relies on a greedy algorithm where 
successive corrections of the approximation are computed in small low-rank tensor subsets using 
least-squares with sparsity inducing regularization. Here, we restrict the presentation to the case 
where successive corrections are computed in the elementary set of rank-one tensors TZ\, thus 
resulting in the construction of a low-rank canonical approximation of the solution. However, the 
methodology could be naturally extended in order to construct sparse representations in other 
tensor formats. 

3.1 Sparse canonical tensor subsets 

Let TZi denote the set of (elementary) rank-one tensors in Sn = ... <S^^ , defined by 
or equivalently by 

TZi = {w;(y) = (^(y), w^^) ® . . . (g) w^'')); w^'^) G M"'= } , 

where (p{y) = 4>^^\yi) (g) . . . ig) <p^'^\yr), with cf)^''^ = {(t>^i \ . . . , 4>n})'^ the vector of basis functions 
of , and where w^'^^ = (wf , . . . , w^^)^ is the set of coefficients of w^''^ in the basis of S!^^ , that 

means w^'''>{yk) = ^"=1 Wi(l>f\yk)- 

Approximation in 1Z\ using classical least-squares methods possibly enables to recover a good 
approximation of the solution using a reduced number of samples. However, it may not be suf- 
ficient in the case where the approximation spaces S^^ have high dimensions n^, thus resulting 
in a manifold of rank-one elements T^i with high dimension Y^^^^Uk- This diflaculty may be 
circumvented by introducing approximations in a m-sparse rank-one subset defined as 

^m-sparse ^ j^^^^ ^ (0(2/), W^^) . . . ® W^); w(*=) G M"^ ||wW ||o < mfc} 

with effective dimension 'Yl'k=i''^k Sfc=i"'ft- mentioned in section 2.3, performing least- 
squares approximation in this set may not be computationally tractable. We thus introduce a 
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convex relaxation of the £o- "norm" to define the subset TZj of TZi defined as 

7^7 = {w{y) = (0(2/), w(i) ® . . . ® wW); w« e M"^ Hw^^^^Hi < 7^} , 

where the set of parameters (w'^-'^'', . . . , w'^'"'') is now searched in a convex subset of R""^ x . . . x M"''. 
Finally, we introduce the set of canonical rank-m tensors TZm = {v = Wi ;wi € TZi } and the 

corresponding subset 




In the following, we propose algorithms for the construction of approximations in tensor subsets 
TZj and TZ'^''"'''"' ■ These subsets are low-dimensional subsets of the approximation space Sn 
but are not linear spaces nor convex sets, thus making more difficult the analysis and practical 
resolution of optimization problems in these sets. In practice, we rely on heuristic alternating 
minimization algorithms presented in the following sections. 

Remark 1 Other tensor subsets have been introduced that have better approximation properties, 
such as Tucker tensor sets or Hierarchical Tucker tensor sets (see [13]). These tensor formats are 
not considered here. 



3.2 Construction of speirse rank-one tensor approximation 

The subset 7^7 can be parametrized with the set of parameters (w^^^ . . . , w^''^) e M"i x . . . x M"'' 
such that II w'^'^) ||i < 7/0 (fc = 1, . . . , r), this set of parameters corresponding to an element w = 
®]^^iW^^'> where w^*^) denotes the vector of coefficients of an element w^'^). With appropriate 
choice of bases, a sparse rank-one function w could be well approximated using vectors w^'^) with 
only a few non zero coefficients. 

We compute a rank-one approximation w = 'S'^^-iW^''^ G TZj of v by solving the least-squares 
problem 



mm \\v - w\\q = mm -((/), w'^' (g) ... ig) w^""))!]?,. (9) 

wenj ^ w(i'eE"i,...,w('^'eR"'- 

''''l|l<7l,-,l|w<"'||i<7r- 



Problem (9) can be equivalently written 

r 

V - (0, w(i) ® . . . wW)||^ + J2 ^fcllw^'^ 111' (10) 



mm 

i^(i)6R''i,...,w('-)6i 



fe=l 



where the values of the regularization parameters Afc > (interpreted as Lagrange multipliers) 
are related to jk- In practice, minimization problem (10) is solved using an alternating minimiza- 
tion algorithm which consists in successively minimizing over w^-'^ for fixed values of {w^''^}i~^j. 
Denoting by z S MP the vector of samples of function v{^) and by ^^■'^ € IK'S x "3 ^he matrix 
whose components are {^^■'^)qi = (p{{yj) Ylkjtj '^^''^vD' minimization problem on w^^^ can be 
written 

min ||z-*(^')w(^)||^-^Aj||w(^')||i, (11) 

which has a classical form of a least-squares problem with a sparsity inducing ^i-regularization. 

Problem (11) is solved using the Lasso modified LARS algorithm where the optimal solution is 
selected using the leave-one-out cross validation procedure presented in Algorithm 1. Algorithm 
2 outlines the construction of a sparse rank one approximation. 
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Algorithm 2 Algorithm to compute sparse rank one approximation of a function v. 

Input: vector of evaluations z = {v{y^), .... v{'iP)Y' e M'^. 
Output: rank-one approximation w{y) = {<f>{y),w'-^\ . . . , w^'"'). 
1: Initialize the vectors {w''^)}^^^ and set / = 0. 

2; I + 

3: for j = 1, ... ,r do 
4: Evaluate matrix 

5: Solve problem (11) using Algorithm 1 for input z G M9 and ^^■'^ e M'^xnj obtain w^-'^. 
6: end for 

7: Compute z = {w{y^), . . . , w{y^)Y' . 
8: if ||z — z||2 > e and i < Zmaa; then 
9: Go to Step 2. 
10: end if 

11: Return the solution parameters w^^\ . . . , w^''^. 



Remark 2 (Other types of regularization) Different rank-one approximations can he defined 
by changing the type of regularization and constructed by replacing step 5 of Algorithm 2. First, one 
can consider a simple least-squares without regularization, namely Ordinary Least Square (OLS), 
by replacing step 5 by the solution of 

min llz-^^^V^-^'^lli. (12) 

w(3)eK'*3 



Also, one can consider a regularization using £2-norm (ridge regression) by replacing step 5 by the 
solution of 

min ||z-*(^'w(^')||2 + A,-||w(^')||2 (13) 



with a selection of optimal parameter Xj using standard cross-validation (typically k-fold, cross- 
validation). The approximations obtained with these different variants will be compared in the 
numerical examples of section 4- 



3.3 Updated greedy construction of sparse rank-M approximation 

We now wish to construct a sparse rank-M approximation um G 'R-m of u of the form um = 
'^m=i '^mWm by successive computations of sparse rank-one approximations Wm = 'Si^^iWm^ . We 
start by setting uq = 0. Then, knowing an approximation Um-i of u, we proceed as follows. 



Sparse rank-1 correction step. We first compute a correction Wm € TZi of Um-i by solving 

min \\u - Um-i - w\\q, (14) 

which can be reformulated as 

r 

min ||u-u„_i-(0,wW®...®wM)||^ + y;Afe||wW||i. (15) 

k=l 

Problem (15) is solved using an alternating minimization algorithm, which consists in successive 
minimization problems of the form (11) where z e M9 is the vector of samples of the residual 
{u — Um-i){0- Optimal parameters {Afc}^^^ is selected with cross-validation. 
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Updating step. Once a rank-one correction Wm has been computed, it is normalized and the 
approximation Um = Y^^i '^i'^i is computed by solving a regularized least-squares problem: 

m 

min Wu-'^aiWiWl+X'WaWi. (16) 

i—1 

This updating step allows a selection of significant terms in the canonical decomposition, that 
means when some ai are found to be negligible, it yields an approximation Um = J^iLi cu'Wi 
with a lower effective rank representation. The selection of parameter A' is also done with a 
cross-validation technique. 

Remark 3 Note that an improved updating strategy could be introduced as follows. At step m, 
denoting by Wi = 'Ei^^i'wl , 1 < i < m, the computed corrections, we can define approximation 
spaces = span{wl''^}^i C S!^^ (with dimension at most m), and look for an approximation 

of the form Um = YlTx=i ■ ■ ■ Y^T,.=i "ii.-.v '2'fc=i '"^1^^ G '^k=i^m (namely Tucker tensor format). 
The update problem then consists in solving 

, XX E "n-v®U4'^llQ + A'||a||i, (17) 

where ||a||i = ^Ylii^ % |c«ii...iT-l- This updating strategy can yield significant improvements of 
convergence. However, it is clearly unpractical for high dimension r since the dimension of 

the representation grows exponentially with r. For high dim,ension, other types of representations 
should be introduced, such as hierarchical tensor representations. 

Algorithm 3 details the updated greedy construction of sparse low rank approximations. 

Algorithm 3 Updated greedy algorithm for sparse low rank approximation of a function u. 

Input: vector of evaluations z = (u(y^), . . . , u{y^))'^ G K*3 and maximal rank M. 

Output: Sequence of approximations Um = Yll^i ctiWi, where Wi e TZi and a = (ai, . . . , a^) S 

1: Set Wo = 

2: for m = 1, . . . ,M do 

3: Evaluate the vector z^-i = {urn-i{y^), ■ ■ ■ ,nm-i{y'^))'^ G K."^ 

4: Compute a sparse rank-one approximation Wm = using Algorithm 2 for input 

vector of evaluations z — Zm-i- 

5: Evaluate matrix W E R*?^™ with components {W)qi = Wi{y'^). 

6: Compute a G M™ with Algorithm 1 for input vector z G M"^ and matrix W G R'^^'". 
7: end for 



Rank Selection. Let us note that Algorithm 3 not only generates an approximation of the 
function but a sequence of approximations {um}m=i ^ith increasing canonical ranks. To select 
the best rank, we use a fc-fold cross validation method. The overall procedure is as follows: 

• Split sample set S = {1, . . . , Q} into k disjoint subsamples {V^j^^i, 1^ C 5, of approximately 
the same size, and let Si = S\Vi. 

• For each subsample, run Algorithm 3 from the reduced vector of evaluations Z5. (test set) 
to obtain the sequence of model approximations {w^'}i<m<M- Compute the corresponding 
mean squared errors {ef', . • . ,£m} from the validation set of evaluations zy.. 

• For m = 1, . . . , M, compute the fc-fold cross validation error e„ = ^ Y^i=i ^m- 

• Select optimal rank ruop = avgmmi^^^j^em- 

• Run Algorithm 3 with the whole data set z for computing u^ap- 
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4 Application examples 

4.1 Analytical model: Checker- board function 

Function and approximation spaces. We first test Algorithm 3 on the so-called checker- 
board function m(^i,^2) illustrated in figure 1. Random variables and ^2 are independent 
and uniformly distributed on [0, 1]. This function has a rank 2 and the functions of the rank-2 
decomposition are plotted in figure 2. 
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Figure 1: Checker-board function ^(^1,^2) = J2i=i'^i te) (See Fig. 2). 
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Figure 2: Functions of the decomposition of the checker-board function ^(^1,^2) 



E-=i^r^(ei)«^r(6) (rank-2) 



(2), 



For approximation spaces S^^, k G {1,2}, we introduce piecewise polynomials of degree p 
defined on a uniform partition of 5/j composed by s intervals, corresponding to Uk = s{p + 1). 
We denote by 5^ = Pp,^ the corresponding space (for ex. P2,3 denotes piecewise polynomials of 
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degree 2 defined on the partition {(0, |), (i, |), (|, 1)}). We use an orthonormal basis composed 
of functions whose supports is one element of the partition and whose restriction on this support 
is a rescaled Legendrc polynomial. 

Note that when using a partition into s — 6n intervals, n S N, then the checker-board function 
can be exactly represented, that means u € Pp,6n ® ^p,6n for all p and n. Also, the solution admits 
a sparse representation in Pp,6n <8) Pp,6n since an exact representation is obtained by only using 
piecewisc constant basis functions {u € Po,6n •S'Po.en)- The effective dimensionality of the ckecker- 
board function is 2 x 2 x 6 = 24, which corresponds to the number of coefficients required for 
storing the rank-2 representation of the function. We expect our algorithm to detect the low-rank 
of the function and also to detect its sparsity. 



Results. Algorithm 3 allows the construction of a sequence of sparse rank-m approximations 
Um in iS^j (8) 5^2- estimate optimal rank-mop using 3-fold cross validation (See rank selection 
in Section 3.3). 

In order to illustrate the accuracy of approximations in sparse low rank tensor subsets, we 
compare the performance of £i-regularization within the alternating minimization algorithm (step 
4 of Algorithm 3) with no regularization (OLS) and the ^2-i'egularization (see Remark 2 for the 
description of these alternatives). We introduce the relative error e{um, u) which is estimated with 
Monte Carlo integration with Q' = 1000 samples: 

eiu^,u) = tl^^. (18) 

II^IIq' 

Table 1 shows the error e{urn„p,u) obtained for the selected optimal rank rriop, without and with 
updating step 6 of Algorithm 3, and for the different types of regularization during the correction 
step 4 of Algorithm 3. The results are presented for a sample size Q = 200 and for different 
function spaces S^^^ = Sf^^ = Pp,.,. P denotes the dimension of the space ® iS^^- We observe 
that, for P2,3, the solution is not sparse on the corresponding basis and £i-regularization does 
not provide a better solution than ^2-regularization since the approximation space is not adapted. 
However, when we choose function spaces that arc sufficiently rich for the solution to be sparse, 
we see that £i-regularization within the alternating minimization algorithm outperforms other 
types of regularization and yields low rank approximations of the function almost at the machine 
precision. In this example, ^i-regularization allows recovering the exact rank-2 approximation of 
the function. 



Table 1: Relative error e(w„^j^,M) and optimal rank m.op estimation of Checker-board function 
with various regularizations for Q = 200 samples. P is the dimension of the approximation space. 
('-' indicates that none of the rank-one elements were selected during the update step). 
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No update Update 



No update Update 



No update 



Update 



Approximation space 



0.507 




2 
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12 
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1.22 10" 


12 
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7.9310- 


13 
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1.21 10" 


12 


2 


7.88 to- 


13 


2 



7^m(P2.3 ® IP2,3),P = 9^ 
7^m(P2.6®P2,6),P=182 
7^m(P2.12'S'P2.12),P = 362 
TlmiP5,6 <SlP5,6),P = 36^ 
7^m(PlO,6lS)PlO,6),P = 662 



0.527 2 0.527 2 

0.664 2 0.664 2 

20.92 1 

31.27 1 

9648.8 1 



0.508 2 0.508 2 

0.061 8 0.061 8 

0.568 10 0.566 4 

0.624 10 0.623 3 

0.855 10 0.855 10 



Prom this first analytical example, several conclusions can be drawn: 

• i'l-regularization in alternating least square algorithm is able to detect sparsity and hence 
gives very accurate approximations using few samples as compared to OLS and ^2-rcgularizations. 

• Updating step selects the most pertinent rank-one elements and gives an approximation of 
the function with a lower effective rank. 
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4.2 Analytical model: Rastrigin function 

For certain classes of non smooth functions, classical polynomial chaos bases may not be a good 
choice since they have global support and hence cannot capture local features accurately. One 
possible remedy is to use wavelet bases that allow the simultaneous description of global and local 
features [16]. In order to demonstrate the use of our algorithm with polynomial wavelet bases, we 
study the 2-dimensional Rastrigin function given by 

2 

/(O = 20 + ^(e,'-10cos(2^e.)) 

1=1 

where 1^1,^2 are independent random variables uniformly distributed in [-4,4]. The 2-D Rastrigin 
function is shown in figure 3. 




Figure 3: 2-D Rastrigin function. 
We consider two types of approximation spaces 5^^, k e {1, 2}: 

• spaces of polynomials of degree 7, using Legendre polynomial chaos basis, denoted P7, 

• spaces of polynomial wavelets with degree 4 and resolution level 3, denoted W4,3. 

We compute a sequence of sparse rank-m approximations Um in (g) S^^ using Algorithm 
3. We compare the approximations obtained with the different types of regularization during the 
correction step 4 of Algorithm 3: £i-regularization, £2-regularization and no regularization (OLS). 
For each case, an optimal rank approximation Wm„p is selected using 3-fold cross validation (see 
the rank selection strategy in section 3.3). Figure 4 shows the convergence of this optimal approx- 
imation with respect to the sample size Q for the two different approximation spaces and different 
regularizations within the alternated minimization algorithm of the correction step. We find that 
the solution obtained with classical polynomial basis functions is inaccurate and does not improve 
with increase in sample size. Thus, polynomial basis functions are not a good choice to obtain a 
reasonably accurate estimate. On the other hand, when we use wavelet approximation bases, the 
approximation error reduces progressively with increase in sample size. Also, ^i-regularization is 
more accurate when compared to OLS or €2-i'egularization, particularly for few function evalua- 
tions. We can thus conclude that a good choice of basis functions is important in order to fully 
realize the potential of ^i-regularization in the tensor approximation algorithm. 
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Figure 4: Evolution of error u) with respect to the number of samples Q. Approximations 

obtained with Algorithm 3 for different regularizations in alternating minimization algorithm (ii- 
regularization, €2-regularization, or no regularization (OLS)) with optimal rank selection, and for 
the two different approximation spaces P7 (8> P7 {P = 64) and W4^3 W4,3 (P = 1600). 

Figure 5 shows the convergence of the approximation obtained with Algorithm 3 using different 
sample sizes. We find that as the sample size increases, we get better approximations with increas- 
ing rank. We also observe the superiority of £i-regularization compared to the i?2-regularization, 
with a significant gain in terms of accuracy, especially when a small number of function evaluations 
are available. 




10" 



(a) 



10 ■ 








■e-50 

B - 100 

■X- 200 
- 500 
❖ - 1000 



-0- 0- ■ 



3 4 
Rank 

(b) 



Figure 5: Evolution of error u) with respect to rank m for different sample sizes. Wavelet 

approximation space W4^3 (8) W4^3 (P — 1600). Approximations obtained with Algorithm 3 with 
^i-regularization in update step and (a) ^i-regularization or (b) ^2-i'egularization in correction 
step. 



We finally analyze the robustness of Algorithm 3 (with £i-regularization within the alternated 
minimization algorithm) with respect to the sample sets. We use wavelet bases. An optimal 
rank approximation Umap is selected using 3-fold cross validation (see the rank selection strategy 
in section 3.3). We compare this algorithm with a direct sparse least-squares approximation in 
the tensorized polynomial wavelet space (no low-rank approximation), using i?i-regularization (use 
of Algorithm 1). Figure 6 shows the evolution of the relative error with respect to the sample 
size Q for these two strategies. The vertical lines represent the scattering of the error when 
different sample sets are used. We observe a smaller variance of the obtained approximations 
when exploiting low-rank representations. This can be explained by the lower dimensionality 
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of the representation, which is better estimated with a few number of samples. On this simple 
example, we see the interest of using low-rank representations when only a small number of samples 
is available. The interest of using low-rank representations should also become clear when dealing 
with higher dimensional problems. 
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Figure 6: Evolution of error £(umop,i*) with respect to sample size Q. (Red line) approximation 
obtained with direct least-squares approximation with ^i-regularization on the full polynomial 
wavelet approximation space W4.3 W4.3, (blue line) approximation obtained with Algorithm 3 
(with i!i-regularization) and with optimal rank selection. 

4.3 Diffusion equation with multiple inclusions 

We consider a stationary diffusion problem defined on a two dimensional domain r2 = (0,l)x(0,l) 
(see figure 7): 

— V- (kVu) = Id{x) on f2, u = on dft, 

where D — (0.4,0.6) x (0.4,0.6) C 51 is a square domain and Id is the indicator function of D. 
The diffusion coefficient is defined by 

^ Uk onCfc, 1 <fc<8 
\l onf7\(utiCfe) ' 

where the Ck are circular domains and where the ^ U{0.9, 1.1) are independent uniform random 
variables. We define the quantity of interest 

l{u){0 = I uix^Odx. 
Jd 

4.3.1 Influence of tensor format 

We exploit two different tensor structures for the construction of low-rank representations (by 
regrouping some variables). First, we consider the finest tensor structure iS^ (8) . . . (8) iS* . Secondly, 
we consider the tensor structure S^^''^'^'^^ CS) vS^^'^'^'*^ where iS*^^'^'"^'^' and 5*^^'^'^'*^ are spaces of 
functions of 4 variables. We introduce polynomial approximation spaces of total degree p = 2, 3 or 
5. In the first case, a rank-one function is the product of eight one-dimensional polynomials with 
degree p. In the second case, a rank-one function is the product of two 4-dimcnsional polynomials 
with total degree p. 
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Figure 7: Diffusion problem with multiple inclusions. 

As in previous examples, we compare approximations obtained with different types of regu- 
larization in the correction step of Algorithm 3. In order to better understand the performance 
of algorithms using different low-rank tensor structures, we determine the relative error for both 
rank-one approximation (table 2) and optimal rank approximation (table 3). 

Table 2: Relative error e{ui,u) x 10^ for rank one approximations for different regularizations. 
Pp corresponds to a polynomial space with total degree p in i dimensions. Results are reported 
for p = 2, 3 and 5, for different sample sizes Q and for different regularizations in the greedy 
correction step: ordinary least square (OLS), ^2-regularization and £i-regularization. 
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2.40 


7^1 (P^"' (g I 


p{4)) 














24.5 


1.15 


2.14 


1.36 



We can derive several conclusions from table 2. We find that, for few samples, the minimum 
error is obtained for approximation spaces with smallest degree. Also, in these spaces, the solution 
obtained with variables regrouping gives a better approximation. As we may have expected, the 
error reduces with increase in sample size for both cases. However, we find that, with variables 
regrouping, OLS and ^2-i'egularizations are unstable for approximation with high degree polyno- 
mials. Indeed for few samples and with variables regrouping, more coefficients are required to be 
evaluated (as compared to low rank-approximation in the finest tensor space). i?i-regularization 
performs better by retaining few significant polynomial basis functions. This observation is partic- 
ularly significant since in actual practice, we do not know in advance the degree of polynomials that 
would give an optimal approximation and high dimensional approximation spaces may be used 
if no adaptive strategy is considered. When we have a sufficient number of samples with respect 
to the number of basis functions in each group, variables regrouping gives better approximation 
than a complete separation (finest tensor structure) whereas complete separation is the most rel- 
evant when having only few samples (see Q = 50 and approximation in TZmiP'^'' (g) . . . ® Ps^'')). 
This observation features the existence of an optimal variables regrouping (or equivalently optimal 
variable separation) with respect to the number of samples and to the approximation space. A 
challenge would be to design automatic procedures for variables regrouping, avoiding algorithms 
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Table 3: Relative error s{umopiU') x 10^ for optimal rank-moj> approximation (with update and 
optimal rank selection using 3-fold cross validation) of multi-inclusion problem with various reg- 

ularizations. 



Approximation 


Q=50 


Q=100 


Q=1000 




OLS £2 ii 


OLS £2 li 


OLS £2 ii 




6 TTtop C TTlop €. TTtop 


6 TTtop ^ TTlop TTlop 


€ TTlop ^ TTlop ^ TTlop 


7^™(P^'' ® ...®p^'') 
7^^(p^^'®pi■*') 

iZmiri^'' (»...(» rip) 
7^,„(p^*'®p^'') 


2.78 1 2.68 8 2.78 1 
1.90 1 1.83 2 1.72 1 

2.94 1 2.85 6 2.79 2 
1230 1 1250 1 18.3 2 

12.208 1 12.40 3 4.42 2 


2.68 1 2.36 10 2.66 1 
0.84 2 0.91 2 0.88 3 

2.96 1 2.81 10 2.67 2 
16.95 1 15.54 3 1.05 2 

3.16 1 3.11 9 2.97 1 
- 24.5 1 


0.55 20 0.53 20 1.98 4 
0.42 15 0.42 15 0.41 2 

0.61 18 0.66 20 1.93 3 
0.26 7 0.25 9 0.22 4 

1.99 4 1.34 20 2.40 1 
0.71 7 0.73 4 0.41 4 



with combinatorial complexity. 

Table 3 shows the effect of greedy construction for both tensor structures. With few exceptions, 
we find that £i-regularization is the best. 

4.3.2 Strategy for the approximation of vector valued functions 

We wish to obtain a reduced model for the random field u{x, ^) e Vn <S) ip^^ (Si) (g) . . . (^d)) 
where Vjv is a A''-dimensional finite element approximation space used for the discretization of the 
partial differential equation. A straightforward application of the previous methodology would 
rcqiiirc to evaluate u at certain number of realizations of the input random vector ^ and to perform 
least-squares approximation for each node of the finite element mesh. However, this may not be 
feasible for large number of degrees of freedom N. Hence we wish to obtain a reasonably accurate 
low rank representation of the random field u(a;,^). For that purpose, we apply an empirical 
Karhunen-Loeve decomposition. 

Let u(^'^) e R^, 5 G {1 . . .Q}, represent the finite element solutions (nodal values) associated 
with the Q realizations of the input random vector ^. We denote by Ug ~ ^ u(^') the 

empirical mean, and by u(^'') = u(^'?) — Uo. We gather the centered realizations in a matrix 
TJ = (u(^-^) . . . u(^'5)). We compute the Singular Value Decomposition (SVD) U = '}2ii=i o'iVi'SiZi, 
where the Vj e and the Zj e MP are respectively the left and right singular vectors of U, 
and the arc the corresponding singular values ordered with decreasing values. Then, this 
decomposition is truncated by retaining only the iV* dominant singular values: U « J2iLi CiVjigjZj. 
It corresponds to the following approximation of the finite element solution (truncated empirical 
Karhunen-Loeve decomposition): 

N* 

U(0 «Uo + ^(TiVi^i(0 (19) 
i=l 

where the Zi{^) are random variables whose evaluations at samples are gathered in the 

vectors Zj. 

This procedure reduces the effective dimensionality of the solution field, which is now a function 
of only N* random variables. Our least-squares-based tensor approximation method can be applied 
to each random variable Zi separately. 

We perform the above procedure for multi-inclusion problem and compare the solution obtained 
for sample sets of three difi:erent sizes {Q = 100,500, 1500). Figure 8 shows the relative error v/s 
the rank of the SVD decompostion of U (with respect to Probenius norm). As can be seen, the 
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error drops significantly after 8 modes. Thus, we choose N* = 8 and we apply the least-squares- 
based tensor approximation method for the random variables {^i(C)}i=ii using Algorithm 3 for 
each input vector independently. We consider the finest tensor structure (gi ■ ■ ■ with 
polynomial spaces of total degree p = b. 




Figure 8: Error of the truncated SVD with respect to the rank for different sample size. 

Figure 9 shows the cross validation error of the low rank approximations of {2^i(0}f=i with 
different sample sizes. As can be expected, the cross validation error reduces with increase in 
sample size and we can select the best low-rank approximation (with optimal rank selection) for 
each mode to construct an approximation of u(^) under the form (19). 
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Figure 9: Evolution of the 3-fold cross validation error of the low rank approximation of the first 
8 random variables {zi{£^)}^^i with the rank (left to right) for sample size lOO(blue), 500(red) and 
1500(green). 



Let / denote the approximation of quantity of interest / (mean of the solution field in a square 
domain defined in section 4.3.1) obtained by the post treatment of the low rank approximation 
of u(^). Figure 10 shows the error e{I,I) on the quantity of interest obtained with the reduced 
model for different sample sizes. The same figure also plots the empirical mean of the quantity 
of interest. We observe that, for few samples (i.e. 100), the mean value estimated from reduced 
model is not good since the corresponding approximation of the stochastic modes are inaccurate. 
However, as we increase the sample size, this strategy gives better approximation of stochastic 
modes and hence more accurate estimation of the quantity of interest. 



Least-squares method for sparse low rank approximation 



18 



10 ■ 



■ Low rank model 
- Empirical mean 



Sample Size (Q) 

Figure 10: Evolution of the error e{I, I) on the quantity of interest with the sample size Q and 
the empirical mean of /. 

5 Conclusion 

A non-intrusive least-squares-based sparse low-rank tensor approximation method has been pro- 
posed for propagation of uncertainty in high dimensional stochastic models. Greedy algorithms 
for low-rank tensor approximation have been combined with sparse least-squares approximation 
methods in order to obtain a robust construction of sparse low-rank tensor approximations in 
high dimensional approximation spaces. The ability of the proposed method to detect and ex- 
ploit low-rank and sparsity was illustrated on two analytical models and on a stochastic partial 
differential equation (diffusion problem). The results have illustrated the significant differences 
between approximations obtained using different tensor structures. It reveals the need to devise 
algorithms that are able to automatically detect optimal low-rank tensor structures and exploit 
at best the few samples available in practical applications in uncertainty propagation. Also, the 
design of sampling strategies that are adapted to the construction of low-rank approximations 
could further improve the performances of these techniques. 
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